knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  echo = FALSE  # Hide code in the rendered output
)

Spatial Data Analysis Report: Xenium Human Breast Cancer Dataset

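This independent project presents an analysis workflow for studying the spatial transcriptomics of a human breast cancer dataset generated with the Xenium platform. The analysis follows standard steps: quality control, normalization and scaling, dimensionality reduction, clustering, cell type annotation, and spatial visualization.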

Key Findings

The analysis found distinct cell populations, with invasive carcinoma cells concentrated in specific areas, potentially indicating tumor boundaries. FASN expression was notably high in these invasive regions, consistent with its role in tumor lipid metabolism, while CEACAM6 marked ductal carcinoma areas. Immune cells were scattered throughout the tissue, suggesting infiltration into the tumor.

Introduction

The Xenium platform is an in situ spatial transcriptomics technology that measures gene expression together with the spatial location of cells in a tissue section. The dataset contains gene expression data for well over 100,000 cells (163,779 cells enter the clustering step after quality filtering), along with the coordinates of each cell in the tissue section. In this analysis, we use the dataset to identify cell types, spatial patterns, and marker genes associated with breast cancer.

Data Source: https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast

Loading Packages
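
The report hides its code chunks, so the package list is not shown. The following is a minimal sketch of what this step might look like, assuming the workflow is built on Seurat (suggested by the function and metadata names used later, such as SCTransform, ImageDimPlot, and nCount_Xenium). The exact set of packages is an assumption.

```r
# Packages assumed for this workflow; the report itself does not list them.
pkgs <- c("Seurat", "ggplot2", "dplyr")

# Check availability first so that a missing package gives a clear message
# instead of an error halfway through the analysis.
is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_available]

if (length(missing_pkgs) > 0) {
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach every package only when all of them are installed.
  for (p in pkgs) library(p, character.only = TRUE)
}
```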

Loading Data
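
The loading code is also hidden in the report. A sketch of how the Xenium output bundle could be read with Seurat's LoadXenium() is given below. The helper name load_xenium_data, the directory path, and the object name xenium.obj are hypothetical; the assay name "Xenium" is inferred from the nCount_Xenium and nFeature_Xenium metadata columns mentioned in the next section.

```r
# Hypothetical wrapper around Seurat::LoadXenium(). `data_dir` must point to
# the unpacked Xenium output bundle; the actual path is not given in the report.
load_xenium_data <- function(data_dir, fov = "fov", assay = "Xenium") {
  if (!dir.exists(data_dir)) {
    stop("Xenium output directory not found: ", data_dir)
  }
  # Returns a Seurat object with counts in `assay` and coordinates in `fov`.
  Seurat::LoadXenium(data.dir = data_dir, fov = fov, assay = assay)
}

# Usage (the path is a placeholder):
# xenium.obj <- load_xenium_data("path/to/xenium_output")
```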

Data Preprocessing / Quality Control

Initial preprocessing removes low-quality cells and visualizes key metrics:

  • Remove cells with zero counts: Ensures only cells with detectable transcripts are analyzed.
  • Visualize distributions: Violin plots display genes per cell (nFeature_Xenium) and transcript counts per cell (nCount_Xenium).
  • Filtering: Only cells with 5–200 detected genes and 10–1000 transcript counts are retained, reducing noise from low-quality cells and outliers.
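
The thresholds above can be illustrated on a small toy table. The helper passes_qc and the toy values are hypothetical, and treating the bounds as inclusive is an assumption, since the report does not say whether the limits themselves are kept. The commented lines show how the same steps would typically be written against a Seurat object.

```r
# Hypothetical helper: flags cells that pass the QC thresholds stated above.
# `meta` is a data frame with columns nFeature_Xenium and nCount_Xenium,
# for example the cell metadata of a Seurat object (xenium.obj[[]]).
# Bounds are treated as inclusive (assumption).
passes_qc <- function(meta,
                      feature_range = c(5, 200),
                      count_range = c(10, 1000)) {
  meta$nFeature_Xenium >= feature_range[1] &
    meta$nFeature_Xenium <= feature_range[2] &
    meta$nCount_Xenium >= count_range[1] &
    meta$nCount_Xenium <= count_range[2]
}

# Toy metadata for illustration only (not values from the dataset).
toy_meta <- data.frame(
  nFeature_Xenium = c(0, 4, 50, 150, 250),
  nCount_Xenium   = c(0, 8, 120, 900, 1500),
  row.names       = paste0("cell", 1:5)
)

keep <- passes_qc(toy_meta)
print(rownames(toy_meta)[keep])  # "cell3" "cell4"

# On a Seurat object the same steps would typically look like:
# xenium.obj <- subset(xenium.obj, subset = nCount_Xenium > 0)
# VlnPlot(xenium.obj, features = c("nFeature_Xenium", "nCount_Xenium"),
#         ncol = 2, pt.size = 0)
# xenium.obj <- subset(xenium.obj,
#                      subset = nFeature_Xenium >= 5 & nFeature_Xenium <= 200 &
#                        nCount_Xenium >= 10 & nCount_Xenium <= 1000)
```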

Normalization and Scaling

To correct for technical variation (e.g., differences in total transcript counts per cell, the imaging-based analogue of sequencing depth), we apply SCTransform, a variance-stabilizing normalization method that models the dependence of each gene's expression on per-cell depth. Note: SCTransform also returns scaled residuals, so no separate scaling step is required.
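
A minimal sketch of this step, assuming the counts live in an assay named "Xenium" (inferred from the metadata column names). The wrapper name normalize_xenium and the object name xenium.obj are hypothetical; the report does not show which SCTransform arguments were actually used.

```r
# Hypothetical wrapper; `obj` is a Seurat object whose raw counts are stored
# in the "Xenium" assay.
normalize_xenium <- function(obj, assay = "Xenium") {
  # SCTransform fits a regularized negative binomial model per gene and stores
  # the Pearson residuals in a new "SCT" assay, which becomes the default.
  Seurat::SCTransform(obj, assay = assay)
}

# Usage (object name is a placeholder):
# xenium.obj <- normalize_xenium(xenium.obj)
```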

Dimensionality Reduction

Dimensionality reduction simplifies the dataset while preserving biologically relevant variation:

  • PCA: Computes the top 30 principal components.
  • UMAP: Projects the data into a 2D space for visualization.
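
The two steps above can be sketched as follows. The wrapper name reduce_dimensions is hypothetical. Using all panel genes as PCA features and feeding all 30 components into UMAP are assumptions; the report only states that 30 principal components were computed.

```r
# Hypothetical wrapper; `obj` is a normalized Seurat object.
reduce_dimensions <- function(obj, n_pcs = 30) {
  # Xenium panels are targeted, so every gene in the panel is used as a PCA
  # feature here (assumption, not confirmed by the report).
  obj <- Seurat::RunPCA(obj, npcs = n_pcs, features = rownames(obj))
  # UMAP is computed on the PCA embedding (the default reduction).
  Seurat::RunUMAP(obj, dims = seq_len(n_pcs))
}

# Usage (object name is a placeholder):
# xenium.obj <- reduce_dimensions(xenium.obj)
```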

Clustering

Cells are grouped based on shared expression profiles:

  • Find Neighbors: Uses PCA-reduced data to identify cell neighbors.
  • Find Clusters: Applies a resolution of 0.2 to define clusters, visualized with a UMAP plot.

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 163779
## Number of edges: 5297889
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9488
## Number of communities: 11
## Elapsed time: 93 seconds
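
The neighbor search and clustering steps described above might look like the sketch below. The wrapper name cluster_cells is hypothetical, and the number of PCA dimensions passed to FindNeighbors is an assumption. The resolution of 0.2 comes from the text, and the Louvain algorithm with 10 random starts in the console output matches the FindClusters defaults.

```r
# Hypothetical wrapper; `obj` is a Seurat object with a "pca" reduction.
cluster_cells <- function(obj, n_pcs = 30, resolution = 0.2) {
  # Build the shared nearest neighbor graph on the PCA embedding.
  obj <- Seurat::FindNeighbors(obj, reduction = "pca", dims = seq_len(n_pcs))
  # Louvain community detection (default algorithm) at the given resolution.
  Seurat::FindClusters(obj, resolution = resolution)
}

# Usage (object name is a placeholder):
# xenium.obj <- cluster_cells(xenium.obj)
# Seurat::DimPlot(xenium.obj, reduction = "umap", label = TRUE)
```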

Cell Type Annotation

The clusters above are annotated using differential expression analysis and known marker genes:

  • Find Markers: Identifies genes differentially expressed in each cluster.
  • Visualize Markers: Feature plots display expression of cell-type-specific genes.
  • Assign Identities: Clusters are renamed based on marker expression.

Marker Genes Used:

  • B cells: MS4A1, CD79A
  • Macrophages: ITGAX
  • T cells: CD3E, CD4, CD8A
  • NK cells: NKG7
  • Mast cells: KIT
  • Endothelial cells: PECAM1
  • Myoepithelial cells: KRT15
  • Fibroblasts: LUM
  • Proliferating cells: MKI67
  • Ductal carcinoma: CEACAM6
  • Invasive tumor: FASN
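
The annotation steps can be sketched as follows. The marker list is copied from the report; the helper names find_cluster_markers and annotate_clusters are hypothetical, and the cluster-to-label mapping in the usage comment is a placeholder because the report does not state which cluster received which label.

```r
# Marker genes as listed in the report.
marker_genes <- list(
  "B cells"             = c("MS4A1", "CD79A"),
  "Macrophages"         = "ITGAX",
  "T cells"             = c("CD3E", "CD4", "CD8A"),
  "NK cells"            = "NKG7",
  "Mast cells"          = "KIT",
  "Endothelial cells"   = "PECAM1",
  "Myoepithelial cells" = "KRT15",
  "Fibroblasts"         = "LUM",
  "Proliferating cells" = "MKI67",
  "Ductal carcinoma"    = "CEACAM6",
  "Invasive tumor"      = "FASN"
)

# Hypothetical helper: differential expression of each cluster against all
# others, keeping only positively enriched genes (assumption).
find_cluster_markers <- function(obj) {
  Seurat::FindAllMarkers(obj, only.pos = TRUE)
}

# Hypothetical helper: rename clusters from a named character vector
# (names = cluster ids, values = cell type labels).
annotate_clusters <- function(obj, labels) {
  Seurat::RenameIdents(obj, labels)
}

# Usage (object name and mapping are placeholders, not from the report):
# Seurat::FeaturePlot(xenium.obj, features = c("CEACAM6", "FASN"))
# labels <- c("0" = "Invasive tumor", "1" = "Fibroblasts")
# xenium.obj <- annotate_clusters(xenium.obj, labels)
```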

Spatial Visualization

Spatial plots integrate gene expression with tissue coordinates:

  • Cluster/Cell Type Maps: Spatial plots of clusters and cell types using ImageDimPlot.
  • Gene Expression Maps: Expression of FASN and CEACAM6 plotted with ImageFeaturePlot, with color scales (e.g., white to red for FASN, blue to red for CEACAM6).
  • Tumor Markers: Tumor-specific marker genes overlaid on spatial coordinates.
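
A sketch of how these plots could be produced. ImageDimPlot and ImageFeaturePlot are named in the text; the wrapper name plot_spatial, the field-of-view name "fov", the "polychrome" palette, and the use of the molecules argument for the tumor marker overlay are assumptions. The output file names in the usage comment are the ones referenced in the Results section.

```r
# Hypothetical wrapper; `obj` is an annotated Seurat object with a Xenium
# field of view named `fov`. Returns a named list of ggplot objects.
plot_spatial <- function(obj, fov = "fov") {
  list(
    # Cells colored by their current identity (cluster or cell type).
    cell_types = Seurat::ImageDimPlot(obj, fov = fov, cols = "polychrome",
                                      axes = TRUE),
    # Per-cell expression maps with the color scales described above.
    fasn = Seurat::ImageFeaturePlot(obj, fov = fov, features = "FASN",
                                    cols = c("white", "red")),
    ceacam6 = Seurat::ImageFeaturePlot(obj, fov = fov, features = "CEACAM6",
                                       cols = c("blue", "red")),
    # Individual transcript positions of the tumor markers overlaid on cells.
    tumor_markers = Seurat::ImageDimPlot(obj, fov = fov,
                                         molecules = c("CEACAM6", "FASN"),
                                         nmols = 20000)
  )
}

# Usage (object name is a placeholder):
# plots <- plot_spatial(xenium.obj)
# ggplot2::ggsave("Spatial_FASN.png", plots$fasn)
# ggplot2::ggsave("Spatial_CEACAM6.png", plots$ceacam6)
# ggplot2::ggsave("Spatial_TumorMarkers.png", plots$tumor_markers)
```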

Results

The analysis of the Xenium Human Breast Cancer Dataset provided key insights into cellular composition and spatial organization:

  1. Cell Type Identification
    • Clustering identified distinct populations, annotated as invasive and ductal carcinoma cells, myoepithelial cells, fibroblasts, macrophages, endothelial cells, T/NK cells, B cells, and mast cells.
    • Marker gene expression validated these annotations (e.g., FASN for invasive tumors [1], CEACAM6 for ductal carcinoma [2]). See the table below for key markers:
    Cell Type          Marker Genes             Reference
    Invasive Tumor     FASN                     Swinnen et al., 2006 [1]
    Ductal Carcinoma   CEACAM6                  Blumenthal et al., 2007 [2]
    B Cells            MS4A1, CD79A             Standard markers
    Macrophages        ITGAX                    Standard markers
    T/NK Cells         CD3E, CD4, CD8A, NKG7    Standard markers
    Endothelial        PECAM1                   Standard markers
    Myoepithelial      KRT15                    Standard markers
    Fibroblasts        LUM                      Standard markers
    Mast Cells         KIT                      Standard markers
  2. Spatial Organization
    • The spatial plot (Spatial_TumorMarkers.png) revealed invasive carcinoma cells concentrated in peripheral regions, potentially marking tumor boundaries, while ductal carcinoma cells aligned with central ductal structures (see Figure 1).
    • Immune cells (macrophages, T/NK cells) were dispersed throughout the tissue, suggesting infiltration into the tumor microenvironment, consistent with immune surveillance roles [3].
    Figure 1: Spatial distribution of tumor markers CEACAM6 (ductal) and FASN (invasive)
  3. Gene Expression Patterns
    • FASN expression was elevated in invasive regions (Spatial_FASN.png), aligning with its role in lipid metabolism supporting tumor growth [1].
    • CEACAM6 marked ductal carcinoma areas (Spatial_CEACAM6.png), consistent with its association with epithelial-derived cancers [2].

These findings underscore the cellular diversity and spatial architecture of the breast cancer microenvironment, with implications for tumor progression and immune interactions.


Conclusion

This spatial transcriptomics analysis demonstrates the utility of the Xenium platform in dissecting the breast cancer tumor microenvironment. By combining gene expression with spatial data, I identified key cell types and their distributions, offering insights into tumor-immune interactions and potential therapeutic targets. The elevated FASN expression in invasive regions is consistent with its metabolic role and suggests avenues for targeting lipid metabolism in cancer therapy. Future work could integrate additional datasets or functional assays to validate these findings and explore clinical implications.


References

  1. Swinnen, J. V., et al. (2006). “Fatty acid synthase drives the growth of prostate cancer cells.” Cancer Research, 66(8), 3814-3820.
  2. Blumenthal, R. D., et al. (2007). “Carcinoembryonic antigen (CEA) and CEACAM6 in cancer progression.” Cancer Biology & Therapy, 6(6), 831-837.
  3. Hanahan, D., & Weinberg, R. A. (2011). “Hallmarks of cancer: The next generation.” Cell, 144(5), 646-674.